A First Look at Reproducibility and Non-Determinism in CMS Software and ROOT Data
Abstract
Reproducibility is an essential component of the scientific process. Including software and data with a published paper is a good step towards reproducible research. However, non-determinism in a scientific workflow can make results very difficult to validate, even between two runs on the same machine, on the same day, with the exact same command and parameters. Reproducibility also requires that results can be validated when the environment changes, which is more challenging still. We explore three high-level methods for dealing with non-determinism in general: 1) domain-specific methods; 2) domain-specific comparisons; and 3) virtualization adjustments. Using a complex high energy physics workflow, we apply these methods to prevent, detect, and eliminate sources of non-determinism. We observe improved determinism using predetermined random seeds, hierarchical data comparisons, and predictable progression of system timestamps. Unfortunately, sources of non-determinism persist even when all three methods are combined. We conclude that there is room for improvement in all three methods, and identify directions that can be taken in each to continue progress towards reproducibility.
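As a rough illustration of two ideas named in the abstract, predetermined random seeds and hierarchical data comparison, the following Python sketch may help. It is not the authors' tooling and does not use CMS software or ROOT: `run_workflow` and `diff_paths` are hypothetical names, and the nested dictionary is only a stand-in for a structured output file. The timestamp is passed in explicitly to mimic a controlled clock.

```python
import random


def run_workflow(seed, timestamp):
    """Hypothetical stand-in for a workflow that emits a nested record."""
    rng = random.Random(seed)  # predetermined seed instead of system entropy
    return {
        "metadata": {"created": timestamp},  # timestamp supplied by the caller
        "events": [
            {"id": i, "energy": rng.gauss(100.0, 15.0)} for i in range(3)
        ],
    }


def diff_paths(a, b, path=""):
    """Walk two nested structures and return the paths at which they differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for key in sorted(set(a) | set(b)):
            child = f"{path}/{key}"
            if key not in a or key not in b:
                diffs.append(child)  # key present on one side only
            else:
                diffs.extend(diff_paths(a[key], b[key], child))
        return diffs
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return [path]  # report the list itself when lengths differ
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(diff_paths(x, y, f"{path}/{i}"))
        return diffs
    return [] if a == b else [path]


# Same seed and same timestamp: the two runs compare equal.
print(diff_paths(run_workflow(42, 0), run_workflow(42, 0)))
# Same seed, different timestamp: only the metadata path is reported.
print(diff_paths(run_workflow(42, 0), run_workflow(42, 1)))
```

Comparing by path rather than byte-for-byte localizes a mismatch to a specific field, which is the general motivation for a hierarchical comparison; a real comparison of ROOT files would need format-aware readers that this sketch does not attempt.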
Similar resources
An Analysis of Reproducibility and Non-Determinism in HEP Software and ROOT Data
Reproducibility is an essential component of the scientific method. In order to validate the correctness or facilitate the extension of a computational result, it should be possible to re-run a published result and verify that the same results are produced. However, reproducing a computational result is surprisingly difficult: non-determinism and other factors may make it impossible to get the ...
CMS Analysis and Data Reduction with Apache Spark
Experimental Particle Physics has been at the forefront of analyzing the world’s largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called “Big Data” technologies have emerged from industry and open source projects to support th...
A Mathematical Model for Cell Formation in CMS Using Sequence Data
The cell formation problem in Cellular Manufacturing System (CMS) design has attracted the attention of researchers for more than three decades. However, the use of sequence data for cell formation has been the least investigated area. Sequence data provides valuable information about the flow patterns of various jobs in a manufacturing system. This paper presents a new mathematical model to solve a cell...
Chronic Mountain Sickness (CMS) Misdiagnosed As High Altitude Cerebral Edema (HACE) At Extreme Altitude (6400 M/21000 Ft)
Introduction: Chronic mountain sickness (CMS) represents a syndrome of secondary polycythemia along with thrombocytopenia, altered hemorheology, pulmonary and systemic hypertension, and congestive heart failure, occurring due to hypobaric hypoxia-anoxia-induced erythropoiesis reported in both native mountain residents and new climbers after prolonged stays at high and extreme a...
Interactive Effects of Cadmium and Zinc Application on Their Uptake by Rice Under Waterlogged and Non-waterlogged Conditions
In order to investigate the effect of Cd and Zn on uptake, concentration and the translocation factor of the Cd and Zn in the rice plant, a factorial experiment was conducted with four factors including two rice cultivars of Vandana and Hashemi, two waterlogged and non-waterlogged conditions and three levels of Zn and Cd (0, 5 and 10 mg kg⁻¹ soil). The experiment was carried out in a randomized...
Publication date: 2016